Harmonizing word alignments and syntactic structures for extracting phrasal translation equivalents
Abstract
Accurate identification of phrasal translation equivalents is critical to both phrase-based and syntax-based machine translation systems. We show that the extraction of many phrasal translation equivalents is made impossible by word alignments done without taking syntactic structures into consideration. To address the problem, we propose a new annotation scheme where word alignment and the alignment of non-terminal nodes (i.e., phrases) are done simultaneously to avoid conflicts between word alignments and syntactic structures. Relying on this new alignment approach, we construct a Hierarchically Aligned Chinese-English Parallel Treebank (HACEPT), and show that all phrasal translation equivalents can be automatically extracted based on the phrase alignments in HACEPT.
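The conflict described in the abstract can be illustrated with a minimal sketch. It uses the standard consistency criterion from phrase-based extraction, not the HACEPT annotation scheme itself. The function names, the token indices, and both toy alignments are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: a source constituent is extractable as a phrase pair
# only if no target word inside its projected span links outside the constituent.

def target_span(src_span, links):
    """Return the tightest target span covering all links from src_span, or None."""
    lo, hi = src_span
    targets = [t for s, t in links if lo <= s <= hi]
    return (min(targets), max(targets)) if targets else None


def extractable(src_span, links):
    """True if src_span and its projected target span form a consistent pair."""
    span = target_span(src_span, links)
    if span is None:
        return False
    lo, hi = src_span
    t_lo, t_hi = span
    # Every link that lands inside the target span must start inside src_span.
    return all(lo <= s <= hi for s, t in links if t_lo <= t <= t_hi)


# Assumed toy data: links are (source index, target index) pairs, and the
# source constituent under test covers source tokens 1..2.
constituent = (1, 2)
compatible_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
conflicting_links = {(0, 0), (1, 1), (2, 3), (3, 2)}

print(extractable(constituent, compatible_links))
print(extractable(constituent, conflicting_links))
```

In the second alignment, target word 2 falls inside the projected span but links to source word 3, which lies outside the constituent, so the constituent has no extractable translation equivalent under this criterion.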
Similar resources
Collocational Translation Memory Extraction Based on Statistical and Linguistic Information
In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide phrasal translation memories. The method integrates statistical and linguistic information to achieve effective extraction of bilingual collocations. The linguistic information includes parts of speech, chunks, and clauses. The method involves first obtaining an extended list of Englis...
Tailoring Word Alignments to Syntactic Machine Translation
Extracting tree transducer rules for syntactic MT systems can be hindered by word alignment errors that violate syntactic correspondences. We propose a novel model for unsupervised word alignment which explicitly takes into account target language constituent structure, while retaining the robustness and efficiency of the HMM alignment model. Our model’s predictions improve the yield of a tree ...
Enriching Source for English-to-Urdu Machine Translation
This paper focuses on the generation of case markers for free word order languages that use case markers as phrasal clitics for marking the relationship between the dependent noun and its head. The generation of such clitics becomes an essential task, especially when translating from fixed word order languages where syntactic relations are identified by the positions of the dependent nouns. To addre...
Johan Segura and Violaine Prince: Using Alignment to detect associated multiword expressions in bilingual corpora
Translating multiword expressions from one language to another requires recognizing them as such. Bilingual multiword expressions are an issue when they are not the exact word-to-word translation of each other. The following examples are provided for a French-English translation task: (1) Phrasal verbs such as « to call in on » becoming « rendre visite », (2) « sorry to hear that », that a human tra...
Improving Function Word Alignment with Frequency and Syntactic Information
In statistical word alignment for machine translation, function words usually cause poor aligning performance because they do not have clear correspondence between different languages. This paper proposes a novel approach to improve word alignment by pruning alignments of function words from an existing alignment model with high precision and recall. Based on monolingual and bilingual frequency...